Automated Noise Detection in a Database Based on a Combined Method

Abstract

Data quality has several dimensions, of which accuracy is the most important. Data cleaning, one of the preprocessing steps in data mining, consists of detecting errors and repairing them. Noise is a common type of error that occurs in databases. This paper proposes an automated method based on k-means clustering for noise detection. First, each attribute (Aj) is temporarily removed and k-means clustering is applied to the other attributes. Thereafter, k-nearest neighbors is used within each cluster. A value is then predicted for Aj in each record by its nearest neighbors. The proposed method detects noisy attributes using the predicted values. Our method is able to identify several noises in a single record. In addition, it can detect noise in fields with different types. Experiments show that the method detects, on average, 92% of the noise existing in the data. The proposed method is compared with a noise detection method based on association rules. The results indicate that noise detection has improved by 13%.
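The pipeline described in the abstract (remove Aj, cluster on the remaining attributes, predict Aj from nearest neighbors inside the cluster, compare) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes purely numeric attributes on comparable scales, uses a deterministic farthest-point initialisation for k-means, predicts with the median of the neighbors, and flags a value when it deviates from the prediction by more than `tol` times the attribute's range. The names `kmeans`, `detect_noise`, `n_neighbors`, and `tol` are hypothetical, and the decision rule in particular is an assumption, since the abstract does not state one.

```python
import math
from statistics import mean, median


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    # Assumption: deterministic farthest-point initialisation for reproducibility.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [mean(col) for col in zip(*members)]
    return labels


def detect_noise(records, k=2, n_neighbors=3, tol=0.5):
    """Return the set of (row, column) positions flagged as noisy."""
    flagged = set()
    for j in range(len(records[0])):
        # Temporarily remove attribute j and cluster on the remaining ones.
        rest = [[v for a, v in enumerate(r) if a != j] for r in records]
        labels = kmeans(rest, k)
        column = [r[j] for r in records]
        spread = (max(column) - min(column)) or 1.0
        for i, r in enumerate(records):
            # Neighbors are searched only inside the record's own cluster.
            peers = [m for m in range(len(records)) if m != i and labels[m] == labels[i]]
            if not peers:
                continue
            peers.sort(key=lambda m: math.dist(rest[i], rest[m]))
            predicted = median(column[m] for m in peers[:n_neighbors])
            # Assumed decision rule: large deviation from the predicted value.
            if abs(r[j] - predicted) > tol * spread:
                flagged.add((i, j))
    return flagged


records = [
    [1.0, 1.2, 0.9, 1.1],
    [1.1, 0.9, 1.0, 1.0],
    [0.9, 1.0, 1.1, 0.9],
    [1.2, 1.1, 1.0, 1.2],
    [1.0, 1.1, 0.9, 5.0],  # last attribute is inconsistent with the rest of the row
    [5.0, 5.2, 4.9, 5.1],
    [5.1, 4.9, 5.0, 5.0],
    [4.9, 5.0, 5.1, 4.9],
    [5.2, 5.1, 5.0, 5.2],
]
print(detect_noise(records))  # {(4, 3)}
```

Because every attribute is tested in turn, a sketch like this can flag more than one field in the same record. Handling categorical fields, as the paper reports, would need a different distance and a mode-based prediction, which this example does not attempt.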

Similar resources

A Novel Noise Reduction Method Based on Subspace Division

This article presents a new subspace-based technique for reducing noise in time-series signals. In the proposed approach, the signal is initially represented as a data matrix. Then, using Singular Value Decomposition (SVD), the noisy data matrix is divided into a signal subspace and a noise subspace. In this subspace division, each derivative of the singular values with respect to rank order is u...

Task-Based Language Teaching in Iran: A Mixed Study Through Constructing and Validating a New Questionnaire Based on Theoretical, Sociocultural, and Educational Frameworks

Many aspects of life in Iran, including lifestyle, science, and technical and technological facilities, can be regarded as more or less imported. The English language and the methods used to teach it are no exception. Nevertheless, the question sometimes arises whether a particular method is compatible with the theoretical, sociocultural, and educational infrastructure of Iranian society. This study was conducted using mixed methods. Also, a questionnaire for language learners ...

A Study on Rate Making and Required Reserves Determination in Reinsurance Market: A Simulation

Reinsurance is widely recognized as an important instrument in the capital management of an insurance company as well as a risk management tool. This thesis is intended to determine premium rates for different types of reinsurance policies. Also, given that the reinsurance coverage of every company depends upon its reserves, different types of reserves and the method of their calc...

A Trust Based Probabilistic Method for Efficient Correctness Verification in Database Outsourcing

Correctness verification of query results is a significant challenge in database outsourcing. Most of the proposed approaches impose high overhead, which makes them impractical in real scenarios. Probabilistic approaches are proposed in order to reduce the computation overhead pertaining to the verification process. In this paper, we use the notion of trust as the basis of our probabilistic app...

Journal

Journal title: Statistics, Optimization and Information Computing

Year: 2021

ISSN: 2310-5070, 2311-004X

DOI: https://doi.org/10.19139/soic-2310-5070-879